# set global knit optionsknitr::opts_chunk$set(echo = T,eval = T,error = F,warning = F,message = F,cache = T)# suppress scientific notationoptions(scipen=999)# list of required packagespackages <-c( #"SIN", # this package was removed from the CRAN repository"MASS", "dplyr", "tidyr", "purrr", "extraDistr", "ggplot2", "loo", "bridgesampling", "brms", "bayesplot", "tictoc", "hypr", "bcogsci", "papaja", "grid", "kableExtra", "gridExtra", "lme4", "cowplot", "pdftools", "cmdstanr", "rootSolve", "rstan" )# NB: if you haven't already installed bcogsci through devtools, it won't be loaded## Now load or install & load allpackage.check <-lapply( packages,FUN =function(x) {if (!require(x, character.only =TRUE)) {install.packages(x, dependencies =TRUE)library(x, character.only =TRUE) } })# this is also required, taken from the textbook## Save compiled models:rstan_options(auto_write =FALSE)## Parallelize the chains using all the cores:options(mc.cores = parallel::detectCores())# To solve some conflicts between packagesselect <- dplyr::selectextract <- rstan::extract
Set options
# set condition so that when we knit a single child Rmd we also get the references for just that section## the parent.Rmd creates 'knit_type' to 'parent'; if this is a child knit then 'knit_type' does not exist yet# if 'parent_knit' exists, then it was created in 'parent.Rmd' and this is a parent knit, so keep it unchanged, otherwise call it 'child'knit_type =ifelse(exists("knit_type"),knit_type,"chapter")
Code
# set global knit optionsknitr::opts_chunk$set(echo = T,eval = T,error = F,warning = F,message = F,cache = T)# suppress scientific notationoptions(scipen=999)# list of required packagespackages <-c( #"SIN", # this package was removed from the CRAN repository"MASS", "dplyr", "tidyr", "purrr", "extraDistr", "ggplot2", "loo", "bridgesampling", "brms", "bayesplot", "tictoc", "hypr", "bcogsci", "papaja", "grid", "kableExtra", "gridExtra", "lme4", "cowplot", "pdftools", "cmdstanr", "rootSolve", "rstan" )# NB: if you haven't already installed bcogsci through devtools, it won't be loaded## Now load or install & load allpackage.check <-lapply( packages,FUN =function(x) {if (!require(x, character.only =TRUE)) {install.packages(x, dependencies =TRUE)library(x, character.only =TRUE) } })# this is also required, taken from the textbook## Save compiled models:rstan_options(auto_write =FALSE)## Parallelize the chains using all the cores:options(mc.cores = parallel::detectCores())# To solve some conflicts between packagesselect <- dplyr::selectextract <- rstan::extract
1 Chapter 4
1.1 Exercise 4.1 - A simple linear regression: Power posing and testosterone.
Load the following data set:
data("df_powerpose")head(df_powerpose)
id hptreat female age testm1 testm2
2 29 High Male 19 38.725 62.375
3 30 Low Female 20 32.770 29.235
4 31 High Female 20 32.320 27.510
5 32 Low Female 18 17.995 28.655
7 34 Low Female 21 73.580 44.670
8 35 High Female 20 80.695 105.485
The data set, which was originally published in Carney, Cuddy, and Yap (2010) but released in modified form by Fosse (2016), shows the testosterone levels of 39 different individuals, before and after treatment, where treatment refers to each individual being assigned to a high power pose or a low power pose. In the original paper by Carney, Cuddy, and Yap (2010), the unit given for testosterone measurement (estimated from saliva samples) was picograms per milliliter (pg/ml). One picogram per milliliter is 0.001 nanogram per milliliter (ng/ml).
The research hypothesis is that on average, assigning a subject a high power pose vs. a low power pose will lead to higher testosterone levels after treatment. Assuming that you know nothing about normal ranges of testosterone using salivary measurement, choose an appropriate Cauchy prior (e.g., \(Cauchy(0,2.5)\) for the target parameter(s).
Investigate this claim using a linear model and the default priors of brms. You’ll need to estimate the effect of a new variable that encodes the change in testosterone.
1.1.1 My work
DV: change in testosterone levels (continuous, testm2-testm1)
id hptreat female age testm1 testm2 change
2 29 High Male 19 38.725 62.375 23.650002
3 30 Low Female 20 32.770 29.235 -3.534999
4 31 High Female 20 32.320 27.510 -4.810000
5 32 Low Female 18 17.995 28.655 10.660000
7 34 Low Female 21 73.580 44.670 -28.910004
8 35 High Female 20 80.695 105.485 24.790000
# eyeball DV distributionhist(df_powerpose$change)
# order factordf_powerpose$hptreat <-factor(df_powerpose$hptreat, levels =c("Low", "High"))levels(df_powerpose$hptreat)
[1] "Low" "High"
# set contrastscontrasts(df_powerpose$hptreat) <-c(-0.5,+0.5)contrasts(df_powerpose$hptreat)
[,1]
Low -0.5
High 0.5
# fit modelfit_powerpose <-brm( change ~1+ hptreat,data = df_powerpose,family =gaussian())
# explore modelfit_powerpose
Family: gaussian
Links: mu = identity; sigma = identity
Formula: change ~ 1 + hptreat
Data: df_powerpose (Number of observations: 39)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.02 3.34 -6.77 6.49 1.00 3743 3138
hptreat1 8.82 6.72 -4.79 22.02 1.00 3717 2767
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 20.55 2.39 16.58 25.81 1.00 3483 2795
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_powerpose)
pp_check(fit_powerpose, ndraws=100)
1.2 Exercise 4.2 - Another linear regression model: Revisiting attentional load effect on pupil size.
Here, we revisit the analysis shown in the chapter, on how attentional load affects pupil size.
Our priors for this experiment were quite arbitrary. How do the prior predictive distributions look like? Do they make sense?
Is our posterior distribution sensitive to the priors that we selected? Perform a sensitivity analysis to find out whether the posterior is affected by our choice of prior for the \(sigma\).
Our data set includes also a column that indicates the trial number. Could it be that trial has also an effect on the pupil size? As in lm, we indicate another main effect with a + sign. How would you communicate the new results?
1.3 Exercise 4.3 - Log-normal model: Revisiting the effect of trial on finger tapping times
We continue considering the effect of trial on finger tapping times.
Estimate the slowdown in milliseconds between the last two times the subject pressed the space bar in the experiment.
How would you change your model (keeping the log-normal likelihood) so that it includes centered log-transformed trial numbers or square-root-transformed trial numbers (instead of centered trial numbers)? Does the effect in milliseconds change?
1.4 Exercise 4.4 - Logistic regression: Revisiting the effect of set size on free recall.
Our data set includes also a column coded as tested that indicates the position of the queued word. (In Figure 4.13 tested would be 3). Could it be that position also has an effect on recall accuracy? How would you incorporate this in the model? (We indicate another main effect with a + sign).
The data set is from a study (Beall and Tracy 2013) that contains information about the color of the clothing worn (red, pink, or red or pink) when the subject (female) is at risk of becoming pregnant (is ovulating, self-reported). The broader issue being investigated is whether women wear red more often when they are ovulating (in order to attract a mate). Using logistic regressions, fit three different models to investigate whether being ovulating increases the probability of wearing (a) red, (b) pink, or (c) either pink or red. Use priors that are reasonable (in your opinion).
Set options
Code
# set global knit optionsknitr::opts_chunk$set(echo = T,eval = T,error = F,warning = F,message = F,cache = T)# suppress scientific notationoptions(scipen=999)# list of required packagespackages <-c( #"SIN", # this package was removed from the CRAN repository"MASS", "dplyr", "tidyr", "purrr", "extraDistr", "ggplot2", "loo", "bridgesampling", "brms", "bayesplot", "tictoc", "hypr", "bcogsci", "papaja", "grid", "kableExtra", "gridExtra", "lme4", "cowplot", "pdftools", "cmdstanr", "rootSolve", "rstan" )# NB: if you haven't already installed bcogsci through devtools, it won't be loaded## Now load or install & load allpackage.check <-lapply( packages,FUN =function(x) {if (!require(x, character.only =TRUE)) {install.packages(x, dependencies =TRUE)library(x, character.only =TRUE) } })# this is also required, taken from the textbook## Save compiled models:rstan_options(auto_write =FALSE)## Parallelize the chains using all the cores:options(mc.cores = parallel::detectCores())# To solve some conflicts between packagesselect <- dplyr::selectextract <- rstan::extract
2 Chapter 5 - Bayesian hierachical models
2.1 Exercise 5.1 A hierarchical model (normal likelihood) of cognitive load on pupil .
As in section 4.1, we focus on the effect of cognitive load on pupil , but this time we look at all the subjects of Wahn et al. (2016):
You should be able to now fit a “maximal” model (correlated varying intercept and slopes for subjects) assuming a normal likelihood. Base your priors in the priors discussed in section 4.1.
fit_pupil <-brm(p_size ~1+ c_load + (1+c_load | subj), data = df_pupil_complete,family =gaussian(),prior =c(prior(normal(1000, 500), class = Intercept),prior(normal(0, 1000), class = sigma),prior(normal(0, 100), class = b, coef = c_load),prior(normal(0, 1000), class = sd),prior(lkj(2), class = cor) ) )
Warning: There were 6 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: Examine the pairs() plot to diagnose sampling problems
Examine the effect of load on pupil size, and the average pupil size. What do you conclude?
My solution
fit_pupil
Family: gaussian
Links: mu = identity; sigma = identity
Formula: p_size ~ 1 + c_load + (1 + c_load | subj)
Data: df_pupil_complete (Number of observations: 2228)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~subj (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 3321.35 426.49 2553.97 4207.68 1.00 656
sd(c_load) 72.15 15.68 47.37 107.83 1.00 1158
cor(Intercept,c_load) 0.31 0.24 -0.21 0.71 1.00 1200
Tail_ESS
sd(Intercept) 906
sd(c_load) 1994
cor(Intercept,c_load) 2058
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2456.93 482.09 1465.40 3359.21 1.01 659 1122
c_load 38.60 25.65 -14.75 86.89 1.00 986 1511
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 505.07 7.65 490.20 520.69 1.00 5140 2878
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
95% CrI crosses 0: quite a bit of uncertainty
alternatively, to only print c_load ::: {.cell hash=‘all_chapters_cache/html/unnamed-chunk-20_85ed5e35eaddef5e1c8bb9ddfd85823f’}
## For the hierarchical model is more complicated, # because we want the effect (beta) + adjustment: # we extract the overall group level effect:beta <-c(as_draws_df(fit_pupil_2)$b_c_load)# We extract the individual adjustmentsind_effects_v <-paste0("r_subj[", unique(df_pupil_complete$subj), ",c_load]") adjustment <-as.matrix(as_draws_df(fit_pupil)[ind_effects_v])# We get the by subject effects in a data frame where each adjustment # is in each column.by_subj_effect <-as_tibble(beta + adjustment)# We summarize them by getting a table with the mean and the# quantiles for each column and then binding them.par_h <-lapply(by_subj_effect, function(x) {tibble(Estimate =mean(x),Q2.5 =quantile(x, .025),Q97.5 =quantile(x, .975) )}) %>%bind_rows() %>%# We add a column to identify that the model, # and one with the subject labels:mutate(subj =unique(df_pupil_complete$subj)) %>%arrange(Estimate) %>%mutate(subj =factor(subj, levels = subj))ggplot(par_h,aes(ymin = Q2.5,ymax = Q97.5,x = subj,y = Estimate )) +geom_errorbar() +geom_point() +# We'll also add the mean and 95% CrI of the overall difference # to the plot:geom_hline(yintercept =posterior_summary(fit_pupil_2)["b_c_load", "Estimate"]) +geom_hline(yintercept =posterior_summary(fit_pupil_2)["b_c_load", "Q2.5"],linetype ="dotted",linewidth = .5 ) +geom_hline(yintercept =posterior_summary(fit_pupil_2)["b_c_load", "Q97.5"],linetype ="dotted",linewidth = .5 ) +coord_flip() +ylab("Change in pupil size") +xlab("Participant ID") +theme_bw()
2.2 Exercise 5.2 Are subject relatives easier to process than object relatives (log-normal likelihood)?
We begin with a classic question from the psycholinguistics literature: Are subject relatives easier to process than object relatives? The data come from Experiment 1 in a paper by Grodner and Gibson (2005).
Scientific question: Is there a subject relative advantage in reading?
Grodner and Gibson (2005) investigate an old claim in psycholinguistics that object relative clause (ORC) sentences are more difficult to process than subject relative clause (SRC) sentences. One explanation for this predicted difference is that the distance between the relative clause verb (sent in the example below) and the head noun phrase of the relative clause (reporter in the example below) is longer in ORC vs. SRC. Examples are shown below. The relative clause is shown in square brackets.
(1a) The reporter [who the photographer sent to the editor] was hoping for a good story. (ORC)
(1b) The reporter [who sent the photographer to the editor] was hoping for a good story. (SRC)
The underlying explanation has to do with memory processes: Shorter linguistic dependencies are easier to process due to either reduced interference or decay, or both. For implemented computational models that spell this point out, see Lewis and Vasishth (2005) and Engelmann, Jäger, and Vasishth (2020).
In the Grodner and Gibson data, the dependent measure is reading time at the relative clause verb, (e.g., sent) of different sentences with either ORC or SRC. The dependent variable is in milliseconds and was measured in a self-paced reading task. Self-paced reading is a task where subjects read a sentence or a short text word-by-word or phrase-by-phrase, pressing a button to get each word or phrase displayed; the preceding word disappears every time the button is pressed. In 6.1, we provide a more detailed explanation of this experimental method.
For this experiment, we are expecting longer reading times at the relative clause verbs of ORC sentences in comparison to the relative clause verb of SRC sentences.
Estimate the median difference between relative clause attachment sites in milliseconds, and report the mean and 95% CI.
My solution
alpha <-as_draws_df(fit_df_gg05_rc)$b_Interceptbeta <-as_draws_df(fit_df_gg05_rc)$b_condition1# Difference between object RC coded as .5 and subject RC coded as .5 effect <-exp(alpha + beta * .5) -exp(alpha + beta *-.5)c(mean =mean(effect), quantile(effect, c(.025,.975)))
mean 2.5% 97.5%
34.820055 1.508776 67.937606
Do a sensitivity analysis. What is the estimate of the effect (\(\beta\)) under different priors? What is the difference in milliseconds between conditions under different priors?
My solution
first let’s check out the fit of the model
pp_check(fit_df_gg05_rc, ndraws =50, type ="dens_overlay")
now by participants
ppc_dens_overlay_grouped(df_gg05_rc$RT,yrep =posterior_predict(fit_df_gg05_rc,ndraws =100 ),group = df_gg05_rc$subj) +xlab("Signal in the N400 spatiotemporal window")
we see lots of variation in estimates as a function of our priors
with the tight priors, there is a lot more uncertainty
with the regularised and wider priors the effects are a bit more similar
# these are all the intercept, i'm interested in the slope though :/ggpubr::ggarrange(pp_check(fit_df_gg05_rc_wide, type ="stat") +theme_bw() +ggtitle("Wide priors N(0,1)"),pp_check(fit_df_gg05_rc_tight, type ="stat") +theme_bw() +ggtitle("Tight priors N(0,.01)"),pp_check(fit_df_gg05_rc, type ="stat") +theme_bw() +ggtitle("Original priors N(0,.1)"),nrow =3)
2.3 Exercise 5.3 Relative clause processing in Mandarin Chinese
Load the following two data sets:
data("df_gibsonwu")data("df_gibsonwu2")
The data are taken from two experiments that investigate (inter alia) the effect of relative clause type on reading time in Chinese. The data are from Gibson and Wu (2013) and Vasishth et al. (2013) respectively. The second data set is a direct replication attempt of the Gibson and Wu (2013) experiment.
Chinese relative clauses are interesting theoretically because they are prenominal: the relative clause appears before the head noun. For example, the English relative clauses shown above would appear in the following order in Mandarin. The square brackets mark the relative clause, and REL refers to the Chinese equivalent of the English relative pronoun who.
(2a) [The photographer sent to the editor] REL the reporter was hoping for a good story. (ORC)
(2b) [sent the photographer to the editor] REL the reporter who was hoping for a good story. (SRC)
As discussed in Gibson and Wu (2013), the consequence of Chinese relative clauses being prenominal is that the distance between the verb in relative clause and the head noun is larger in subject relatives than object relatives. Hsiao and Gibson (2003) were the first to suggest that the larger distance in subject relatives leads to longer reading time at the head noun. Under this view, the prediction is that subject relatives are harder to process than object relatives. If this is true, this is interesting and surprising because in most other languages that have been studied, subject relatives are easier to process than object relatives; so Chinese will be a very unusual exception cross-linguistically.
The data provided are for the critical region (the head noun; here, reporter). The experiment method is self-paced reading, so we have reading times in milliseconds. The second data set is a direct replication attempt of the first data set, which is from Gibson and Wu (2013).
The research hypothesis is whether the difference in reading times between object and subject relative clauses is negative. For the first data set (df_gibsonwu), investigate this question by fitting two “maximal” hierarchical models (correlated varying intercept and slopes for subjects and items). The dependent variable in both models is the raw reading time in milliseconds. The first model should use the normal likelihood in the model; the second model should use the log-normal likelihood. In both models, use \(\pm 0.5\) sum coding to model the effect of relative clause type. You will need to decide on appropriate priors for the various parameters.
priors_gw_norm <-c(set_prior("normal(500, 150)", class ="Intercept"),set_prior("normal(0,500)", class ="b", coef ="type1"),set_prior("normal(0,500)", class ="sd"),set_prior("normal(0,1000)",class ="sigma"),set_prior("lkj(2)", class ="cor"))
fit model
fit_gw_norm <-brm(rt ~1+ type + (1+ type | subj) + (1+ type | item), data = df_gibsonwu,family =gaussian(),prior = priors_gw_norm)
Log-normal likelihood
set priors
priors_gw_log <-c(set_prior("normal(6, 1.5)", class ="Intercept"),set_prior("normal(0,1)", class ="b", coef ="type1"),set_prior("normal(0,1)",class ="sd"),set_prior("normal(0,1)",class ="sigma"),set_prior("lkj(2)", class ="cor"))
fit model
fit_gw_log <-brm(rt ~1+ type + (1+ type | subj) + (1+ type | item), data = df_gibsonwu,family =lognormal(),prior = priors_gw_log)
Plot the posterior predictive distributions from the two models. What is the difference in the posterior predictive distributions of the two models; and why is there a difference?
95% credible interval for estimates include 0 for both norm and log-norm distributions
log-norm model CrI is tighter, and includes smaller effect size (but includes 0)
Given the posterior predictive distributions you plotted above, why is the log-normal likelihood model better for carrying out inference and hypothesis testing?
Tip
# look at range per typedf_gibsonwu %>%group_by(type) %>%summarise(min(rt), max(rt))
# A tibble: 2 × 3
type `min(rt)` `max(rt)`
<fct> <int> <int>
1 obj-ext 172 2308
2 subj-ext 189 6217
subj-ext had extreme raw values; this would’ve biased the model with a normal distribution
Next, work out a normal approximation of the log-normal model’s posterior distribution for the relative clause effect that you obtained from the above data analysis. Then use that normal approximation as an informative prior for the slope parameter when fitting a hierarchical model to the second data set. This is an example of incrementally building up knowledge by successively using a previous study’s posterior as a prior for the next study; this is essentially equivalent to pooling both data sets (check that pooling the data and using a Normal(0,1) prior for the effect of interest, with a log-normal likelihood, gives you approximately the same posterior as the informative-prior model fit above).
Dataset 1 log estimates
first check 2nd dataset
head(df_gibsonwu2)
subj item condition pos rt region
9 1m1 15 obj-ext 8 832 head noun
20 1m1 8 subj-ext 8 2131 head noun
33 1m1 11 obj-ext 8 553 head noun
46 1m1 10 subj-ext 8 1091 head noun
62 1m1 16 subj-ext 8 598 head noun
75 1m1 14 subj-ext 8 645 head noun
remind ourselves of the estimates from the first dataset
summary(fit_gw_log)
Family: lognormal
Links: mu = identity; sigma = identity
Formula: rt ~ 1 + type + (1 + type | subj) + (1 + type | item)
Data: df_gibsonwu (Number of observations: 547)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~item (Number of levels: 15)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.20 0.05 0.13 0.33 1.00 1183
sd(type1) 0.07 0.05 0.00 0.20 1.00 1569
cor(Intercept,type1) -0.00 0.42 -0.77 0.78 1.00 3868
Tail_ESS
sd(Intercept) 2054
sd(type1) 1816
cor(Intercept,type1) 2774
~subj (Number of levels: 37)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.25 0.04 0.18 0.34 1.00 1247
sd(type1) 0.14 0.07 0.02 0.28 1.00 1142
cor(Intercept,type1) -0.46 0.30 -0.90 0.23 1.00 2456
Tail_ESS
sd(Intercept) 2073
sd(type1) 1084
cor(Intercept,type1) 2525
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.06 0.07 5.92 6.20 1.00 982 1652
type1 -0.07 0.05 -0.18 0.04 1.00 2852 2661
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.51 0.02 0.48 0.55 1.00 3433 3015
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
for replication study, use:
type: \(LogNormal(-0.07, 0.07)\) (type1 Estimate and Est. Error?)
Intercept: \(LogNormal(6, 0.06)\)
Dataset 2 model
priors_gw2_log <-c(set_prior("normal(6, 0.07)", class ="Intercept"),set_prior("normal(-0.7,0.06)", class ="b", coef ="type1"),set_prior("normal(0,1)",class ="sd"),set_prior("normal(0,1)",class ="sigma"),set_prior("lkj(2)", class ="cor"))
fit model; convergence warning (ESS too low)
fit_gw2_log <-brm(rt ~1+ type + (1+ type | subj) + (1+ type | item), data = df_gibsonwu2,family =lognormal(),prior = priors_gw2_log)
summary(fit_gw2_log)
Family: lognormal
Links: mu = identity; sigma = identity
Formula: rt ~ 1 + type + (1 + type | subj) + (1 + type | item)
Data: df_gibsonwu2 (Number of observations: 595)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~item (Number of levels: 15)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.17 0.04 0.10 0.27 1.00 1247
sd(type1) 0.61 0.15 0.37 0.95 1.00 1252
cor(Intercept,type1) -0.28 0.32 -0.79 0.40 1.00 524
Tail_ESS
sd(Intercept) 2247
sd(type1) 1604
cor(Intercept,type1) 1206
~subj (Number of levels: 40)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.24 0.04 0.18 0.32 1.00 1221
sd(type1) 0.11 0.07 0.01 0.24 1.00 931
cor(Intercept,type1) 0.03 0.34 -0.65 0.72 1.00 3762
Tail_ESS
sd(Intercept) 2346
sd(type1) 1521
cor(Intercept,type1) 2462
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.02 0.05 5.92 6.13 1.00 963 1489
type1 -0.62 0.06 -0.74 -0.49 1.00 3196 2232
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.43 0.01 0.40 0.46 1.00 3846 2923
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
priors_gw12_log <-c(set_prior("normal(6, 0.07)", class ="Intercept"),set_prior("normal(-0.7,0.06)", class ="b", coef ="type1"),set_prior("normal(0,1)",class ="sd"),set_prior("normal(0,1)",class ="sigma"),set_prior("lkj(2)", class ="cor"))
fit_gw12_log <-brm(rt ~1+ type + (1+ type | subj) + (1+ type | item), data = df_gw12,family =lognormal(),prior = priors_gw12_log)
summary(fit_gw12_log)
Family: lognormal
Links: mu = identity; sigma = identity
Formula: rt ~ 1 + type + (1 + type | subj) + (1 + type | item)
Data: df_gw12 (Number of observations: 1142)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~item (Number of levels: 30)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.18 0.03 0.13 0.24 1.00 1514
sd(type1) 0.44 0.10 0.25 0.66 1.01 751
cor(Intercept,type1) -0.15 0.28 -0.66 0.39 1.01 434
Tail_ESS
sd(Intercept) 2524
sd(type1) 1537
cor(Intercept,type1) 922
~subj (Number of levels: 77)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.24 0.03 0.20 0.30 1.00 1524
sd(type1) 0.12 0.05 0.02 0.22 1.00 881
cor(Intercept,type1) -0.34 0.27 -0.81 0.26 1.00 2971
Tail_ESS
sd(Intercept) 2060
sd(type1) 1251
cor(Intercept,type1) 2628
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.05 0.05 5.95 6.14 1.01 771 1433
type1 -0.49 0.07 -0.63 -0.34 1.00 1267 2130
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.47 0.01 0.45 0.49 1.00 4625 2905
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
priors_gw12_log_reg <-c(set_prior("normal(6, 1.5)", class ="Intercept"),set_prior("normal(0,1)", class ="b", coef ="type1"),set_prior("normal(0,1)",class ="sd"),set_prior("normal(0,1)",class ="sigma"),set_prior("lkj(2)", class ="cor"))
fit_gw12_log_reg <-brm(rt ~1+ type + (1+ type | subj) + (1+ type | item), data = df_gw12,family =lognormal(),prior = priors_gw12_log_reg)
summary(fit_gw12_log_reg)
Family: lognormal
Links: mu = identity; sigma = identity
Formula: rt ~ 1 + type + (1 + type | subj) + (1 + type | item)
Data: df_gw12 (Number of observations: 1142)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~item (Number of levels: 30)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.17 0.03 0.13 0.24 1.00 1516
sd(type1) 0.11 0.05 0.01 0.20 1.00 972
cor(Intercept,type1) -0.28 0.31 -0.80 0.37 1.00 3205
Tail_ESS
sd(Intercept) 2133
sd(type1) 1099
cor(Intercept,type1) 1848
~subj (Number of levels: 77)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.25 0.03 0.20 0.30 1.00 1415
sd(type1) 0.11 0.05 0.01 0.20 1.00 1150
cor(Intercept,type1) -0.34 0.30 -0.85 0.33 1.00 3502
Tail_ESS
sd(Intercept) 2205
sd(type1) 1421
cor(Intercept,type1) 2562
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.03 0.04 5.95 6.12 1.00 1690 2214
type1 -0.08 0.04 -0.15 -0.01 1.00 3981 3214
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.47 0.01 0.45 0.49 1.00 4353 2865
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(fit_gw12_log_reg)
prior class coef group resp dpar nlpar lb ub
(flat) b
normal(0,1) b type1
normal(6, 1.5) Intercept
lkj_corr_cholesky(2) L
lkj_corr_cholesky(2) L item
lkj_corr_cholesky(2) L subj
normal(0,1) sd 0
normal(0,1) sd item 0
normal(0,1) sd Intercept item 0
normal(0,1) sd type1 item 0
normal(0,1) sd subj 0
normal(0,1) sd Intercept subj 0
normal(0,1) sd type1 subj 0
normal(0,1) sigma 0
source
default
user
user
user
(vectorized)
(vectorized)
user
(vectorized)
(vectorized)
(vectorized)
(vectorized)
(vectorized)
(vectorized)
user
The data are taken from an experiment that investigate (inter alia) the effect of number similarity between a noun and the auxiliary verb in sentences like the following. There are two levels to a factor called Int(erference): low and high.
(3a) low: The key to the cabinet are on the table (3b) high: The key to the cabinets are on the table
Here, in (3b), the auxiliary verb are is predicted to be read faster than in (3a), because the plural marking on the noun cabinets leads the reader to think that the sentence is grammatical. (Both sentences are ungrammatical.) This phenomenon, where the high condition is read faster than the low condition, is called agreement attraction.
The data provided are for the critical region (the auxiliary verb are). The experiment method is eye-tracking; we have total reading times in milliseconds.
The research question is whether the difference in reading times between high and low conditions is negative.
First, using a log-normal likelihood, fit a hierarchical model with correlated varying intercept and slopes for subjects and items. You will need to decide on the priors for the model.
By simply looking at the posterior distribution of the slope parameter, \(\beta\), what would you conclude about the theoretical claim relating to agreement attraction?
2.5 Exercise 5.5
2.6 Exercise 5.6
2.7 Exercise 5.7
2.8 Exercise 5.8
2.9 Quarto
Quarto enables you to weave together content and executable code into a finished presentation. To learn more about Quarto presentations see https://quarto.org/docs/presentations/.
2.10 Bullets
When you click the Render button a document will be generated that includes:
Content authored with markdown
Output from executable code
2.11 Code
When you click the Render button a presentation will be generated that includes both content and the output of embedded code. You can embed code like this: